Unsupervised Spoken Keyword Spotting and Learning of Acoustically Meaningful Units

Author

  • Yaodong Zhang
Abstract

The problem of keyword spotting in audio data has been explored for many years. Typically, researchers use supervised methods to train statistical models to detect keyword instances. However, such supervised methods require large quantities of annotated data, which are unlikely to be available for the majority of the world's languages. This thesis addresses this lack-of-annotation problem and presents two completely unsupervised spoken keyword spotting systems that do not require any transcribed data. In the first system, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram, without any transcription information. Given several spoken samples of a keyword, segmental dynamic time warping is used to compare the Gaussian posteriorgrams of the keyword samples and the test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. In the second system, to avoid the need for spoken samples, a Joint-Multigram model is used to build a mapping from the keyword text samples to the Gaussian component indices. A keyword instance in the test data can then be detected by calculating the similarity score between the Gaussian component index sequences of the keyword samples and the test utterances. Both systems are evaluated on the TIMIT and MIT Lecture corpora. The results demonstrate the viability and effectiveness of the two systems. Furthermore, encouraged by the success of using unsupervised methods to perform keyword spotting, we present a preliminary investigation of the unsupervised detection of acoustically meaningful units in speech.

Thesis Supervisor: James R. Glass
Title: Principal Research Scientist
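The first system described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes an already-trained diagonal-covariance GMM (the `means`, `variances`, and `weights` arguments are hypothetical inputs), uses plain whole-sequence dynamic time warping rather than the segmental variant, and assumes a negative-log inner-product distance between posterior vectors. All function names are illustrative.

```python
import math

def gaussian_posteriorgram(frames, means, variances, weights):
    """Return, for each frame, the posterior probability of every GMM component.

    Assumes a diagonal-covariance GMM whose parameters were trained elsewhere.
    """
    posteriorgram = []
    for x in frames:
        log_liks = []
        for mu, var, w in zip(means, variances, weights):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            log_liks.append(ll)
        # Normalise in the log domain for numerical stability.
        peak = max(log_liks)
        exps = [math.exp(l - peak) for l in log_liks]
        total = sum(exps)
        posteriorgram.append([e / total for e in exps])
    return posteriorgram

def frame_distance(p, q, eps=1e-10):
    """Assumed distance: negative log of the inner product of two posterior vectors."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))

def dtw_distortion(query, utterance):
    """Plain DTW over whole sequences (a simplification of segmental DTW)."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalise by sequence lengths so long utterances are not penalised unduly.
    return cost[n][m] / (n + m)

def rank_utterances(query, utterances):
    """Return utterance indices sorted from lowest to highest distortion."""
    scores = [(dtw_distortion(query, u), idx) for idx, u in enumerate(utterances)]
    return [idx for _, idx in sorted(scores)]

# Toy example: one-dimensional "features" and a two-component GMM.
means, variances, weights = [[0.0], [5.0]], [[1.0], [1.0]], [0.5, 0.5]
to_pg = lambda frames: gaussian_posteriorgram(frames, means, variances, weights)
query = to_pg([[0.0], [0.0], [5.0]])
utterances = [to_pg([[0.0], [0.1], [5.0], [5.0]]), to_pg([[5.0], [5.0], [0.0]])]
ranking = rank_utterances(query, utterances)
```

In this toy setup the first utterance follows the same component pattern as the query, so it receives the lower distortion score and is ranked first. A real system would compute posteriorgrams from acoustic features such as MFCCs and search over sub-segments of each utterance.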


Similar resources

A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-Based Dynamic Time Warping

In this paper, we propose a novel lower-bound estimate for dynamic time warping (DTW) methods that use an inner product distance on multi-dimensional posterior probability vectors known as posteriorgrams. Compared to our previous work, the new lower-bound estimate uses piecewise aggregate approximation (PAA) to reduce the time required for calculating the lower-bound estimate. We describe the P...


Spoken Web Search using an Ergodic Hidden Markov Model of Speech

An ergodic hidden Markov model (EHMM) of speech can be trained in an unsupervised manner using unlabeled speech. A keyword spotting system has been developed where the queries and test observations are represented as sequences of states of the EHMM. A graphical keyword model is built by aggregating multiple instances of a query or by using mappings between phonemes and states of the EHMM. A mod...


Morphological Segmentation for Keyword Spotting

• We explore the impact of morphological segmentation on Keyword Spotting (KWS).
• Handling out-of-vocabulary (OOV) words is a major challenge in KWS; we aim to alleviate this problem by utilizing sub-word units.
• We augment a state-of-the-art KWS system with subword units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentatio...


A comparison of grapheme and phoneme-based units for Spanish spoken term detection

The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech a...


Confidence Measure for Utterance Verification in Keyword Spotting System

In this article, we propose an utterance verification technique for keyword spotting. The keyword spotting system analyzes given spoken content and searches for every speech segment in which one of the predefined keywords is uttered. To maintain stable recognition performance in the system, we propose an utterance verification technique that verifies whether a found utterance, or a candidate keywo...




Publication date: 2009